Structural Catalytic Core of the Members of the Superfamily of Acid Proteases

Members of the superfamily of acid proteases use two catalytic aspartates for proteolysis of their peptide substrates. Here, we show a minimal structural scaffold, the structural catalytic core (SCC), which is conserved within each family of acid proteases but varies between families, and thus can serve as a structural marker of the four individual protease families. The SCC is a dimer of several structural blocks, such as the DD-link, D-loop, and G-loop, around the two catalytic aspartates in each protease subunit or in an individual chain. A dimer made of two (D-loop + DD-link) structural elements makes a DD-zone, and the D-loop + G-loop combination makes a psi-loop. These structural markers are useful for protein comparison, structure identification, protein family separation, and protein engineering.


Introduction
Earlier, we described structural catalytic cores in many serine and cysteine proteases and showed the presence of unique structure/functional environments, "zones", around the catalytic sites in these proteins [1][2][3][4]. Each zone incorporated a segment of the catalytic core, connected to its respective element of protein functional machinery through a network of conserved hydrogen bonds and other interactions.
The four protease superfamilies studied earlier were (1) alpha/beta-hydrolases, (2) trypsin-like serine proteases, (3) cysteine proteinases, and (4) SGNH hydrolase-like proteins (SCOP (Structural Classification of Proteins, https://scop.mrc-lmb.cam.ac.uk/; accessed on 1 March 2024 [5]) IDs: 3000102, 3000114, 3001808, and 3001315, respectively). Each had only rare structural exceptions, where aspartic acid could be found in place of the canonical catalytic serine or cysteine residues. At the same time, most of the proteases that predominantly use aspartic acid as a catalytic residue are grouped into the "acid proteases" superfamily (SCOP ID: 3001059). This superfamily belongs to the "all beta proteins" class (SCOP ID: 1000001) and comprises four families, among them the "pepsin-like" family (SCOP ID: 4002301). The 3D structure of a protein from the pepsin-like family consists of two similar beta barrel domains (N- and C-terminal) with one catalytic aspartate residue in each domain [6][7][8]. Aspartic proteases of this family use an activated water molecule bound to two conserved aspartate residues for hydrolysis of their peptide substrates. Enzymes of the pepsin-like family are synthesized as inactive zymogens (proenzymes) and are subsequently activated by cleavage of the N-terminal propeptide, which separates from the enzyme upon activation [9]. The protease 3D structures of the other three families resemble that of one of the structural domains of the peptidase from the "pepsin-like" family, and they become active when two monomers assemble to form the catalytically active dimer [10].
Here, we propose a general model of the conserved structural catalytic core (SCC) of aspartate proteases. Based on the "key" features of this model, we present a comparative structural analysis of 3D structures of superfamily representative domains in their zymogenic, free, and ligand-bound forms found in the Protein Data Bank (PDB [11,12]). In addition, we show a comparative structural analysis of SCC models obtained after dimerization of two identical amino acid chains of proteases or duplication of corresponding amino acid fragments within the same chain. Certain elements of the catalytic mechanism are discussed only to highlight the role of the residues shown; a complete protein functional analysis is not within the scope of this manuscript.

Characterization of the Structural Catalytic Core of the Members of the Superfamily of Acid Proteases
2.1. Creating the Dataset of the Acid Proteases Superfamily Fold Proteins
The SCOP classification database [5] and the Protein Data Bank (PDB, http://www.rcsb.org/; accessed on 1 March 2024 [11,12]) were used to identify and retrieve 33 representative structures of proteins from the acid protease superfamily (SCOP ID: 3001059). Detailed descriptions of the protein structural information contained within this set of PDB files are given below.
Representative 3D structures of this superfamily are tabulated in Table 1. Of the four families, only the pepsin-like family contains 3D structures of the zymogenic form of aspartic proteases. In addition to the SCOP database, we used data from the Proteopedia and the Uniprot databases (http://proteopedia.org/wiki/index.php/Main_Page; accessed on 1 March 2024 [19,20] and https://www.uniprot.org/; accessed on 1 March 2024 [21], respectively). Ten proenzyme structures were identified, and they are indicated with a "p" in Table 1. Since each 3D structure of the pepsin-like proenzymes contained two similar domains, both domains were separately analyzed at their catalytic regions, and thus Table 1 contains two lines for each PDB ID of a proenzyme, labeled as "a" and "b". For four proteins out of ten, in addition to coordinates of the zymogenic form, coordinates were also available for both the ligand-free and ligand-bound forms, labeled in Table 1 with letters "c/d" and "e/f", respectively. For three out of ten proteins, in addition to the coordinates of the zymogenic form, there were coordinates of only the ligand-bound form (i.e., "a", "b", "e", and "f" only; rows N: 4, 6, and 7). For the remaining three proteins, coordinates were available only for the zymogenic form (i.e., "a" and "b" only; rows N: 8-10). In addition to these ten proteases from the pepsin-like family, three proteolytically nonfunctional proteins in one or two forms were also analyzed (rows N: 11-13). The proteolytic inactivity of the last three proteins is caused by the replacement of their catalytic aspartic acids in the C-domains with serine.
In SCOP, the retroviral protease (retropepsin) family is represented by the 3D structures of proteases from ten different organisms: HIV-1, HIV-2, HTLV-1, M-PMV, FIV, XMRV, SIV, RSV, MAV, and EIAV [5]. Of the ten proteases listed, only the 3D structure of the XMRV protease differs from that of the other retropepsins [22,23]. Therefore, only the 3D structures of HIV-1 and XMRV proteases in the free and ligand-bound forms were chosen for analysis (Table 1, rows 14 and 15).
The dimeric aspartyl protease family contains seven representative protein 3D structures [5]. Six of the seven representative proteins are homologues of the DNA damage-inducible protein 1 (Ddi1) protease (PDB ID: 4Z2Z) [24]. The seventh representative protein, RC1339/APRc from Rickettsia conorii (PDB ID: 5C9F), unlike all other proteins in the dimeric aspartyl protease family, does not form the obligatory homodimer [25]. Therefore, two 3D structures from this family, Ddi1 and APRc, were taken for conformational analysis. Finally, the Lpg0085-like family contains only one representative 3D structure (PDB ID: 2PMA) [26], and it was included in the analysis.

Structural Catalytic Core around the Catalytic Aspartates in Pepsin
Let us consider three variants of the pepsin 3D structure: the zymogenic propepsin (PDB ID: 3PSG), free pepsin (PDB ID: 4PEP), and ligand-bound pepsin (PDB ID: 6XCZ), which structurally define the pepsin-like family (SCOP ID: 4000470) (Table 1, rows 1a-1f). The boundary between the N- and C-domains of the 3D structure of pepsinogen is in the vicinity of Gly 169 [9]. Asp 32 (N-domain) and Asp 215 (C-domain) are the two catalytically important aspartate residues. Each aspartate residue is positioned within the hallmark Asp-Thr/Ser-Gly (Asp 32 -Thr 33 -Gly 34 in 3PSG) motif which, together with a further Hydrophobic-Hydrophobic-Gly sequence motif, forms an essential structural feature known as a psi-loop motif [28,50,51,52,53]. Let us designate the two fragments of the protease amino acid sequence involved in formation of the psi-loop motif as the D(Asp)-loop and G(Gly)-loop. In this section, the atomic structure of the D- and G-loops in the N- and C-domains and their position relative to each other in the 3D structures of pepsin will be analyzed in detail.
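As a minimal sketch of how the hallmark Asp-Thr/Ser-Gly motif can be located at the sequence level, the following Python snippet scans a one-letter amino acid sequence for the pattern D[TS]G. The function name and the toy sequence are hypothetical and serve only as an illustration; a sequence match alone does not establish that a residue is catalytic, which is why the analysis in this work relies on 3D structures.

```python
import re
from typing import List, Tuple

# Hallmark motif Asp-Thr/Ser-Gly written in one-letter code.
HALLMARK_MOTIF = re.compile(r"D[TS]G")


def find_hallmark_motifs(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based position of the Asp, matched tripeptide) for every hit."""
    return [(m.start() + 1, m.group())
            for m in HALLMARK_MOTIF.finditer(sequence.upper())]


# Hypothetical toy sequence (not a real protease) with one DTG and one DSG hit.
toy_sequence = "AAIVDTGSSNLWVAAAFIADSGTSLLGG"
print(find_hallmark_motifs(toy_sequence))  # [(5, 'DTG'), (20, 'DSG')]
```

In a two-domain enzyme such as pepsin one would expect one hit per domain, whereas a single chain of a homodimeric protease would contribute one hit.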

Propepsin DD-Zone of Propepsin
As noted above, the functional activity of pepsin is carried out simultaneously by both of the catalytic residues, Asp 32 and Asp 215 . Therefore, two D-loops, D-loop N for the N-terminal domain and D-loop C for the C-terminal domain, were analyzed in detail (Tables 1 and S1). It was found that the two domains of propepsin also contain structurally equivalent short peptides, which we call DD-link N (Asp 11 -...-Tyr 14 ) and DD-link C (Gly 188 -Tyr 189 -Trp 190 ), where N and C also stand for the N-terminal domain and C-terminal domain, respectively (Table 1). These two special DD-link peptides "lock" the ends of the D-loop N and D-loop C to form a "circular" structure, which altogether we call the "DD-zone" (Figure 1A).
The DD-zone of propepsin consists of 19 amino acids in total from both D-loops and both DD-links and an additional residue Tyr 125 . Tyr 125 serves as a structural mediator between the C-terminus of the D-loop N and the N-terminus of the DD-link C (Figure 1A); this residue directly follows Ala 124 from G-loop N (Table 1).
Independently, in propepsin, residues Thr 33 and Thr 216 are located next to the two catalytic aspartates. Their side-chain OG1 atoms each make two hydrogen bonds with main-chain nitrogen and oxygen atoms of the opposite D-loop (Figure 1A, Table S1, last column). These interactions are known as the "fireman's grip" motif [54,55].
The proenzyme segment in propepsin is Leu 1p -...-Leu 44p , where "p" indicates the proenzyme sequence region. The pepsin portion in 3PSG starts from Ile 1 . Glu 13 and Phe 15 form a short β-sheet-like interaction with Lys 9p and Val 7p (Figure 1A, Table S2, last column). The residues of this β-sheet undergo a conformational change during the activation process [9].
The Psi-Loop N and Psi-Loop C Motifs: Interactions between the D-Loop and G-Loop in the N- and C-Domains
In 3PSG, the D-loop N tetrapeptide, Asp 32 -...-Ser 35 , contains a frequently occurring Asx-motif [56], where an aspartate (here, catalytic Asp 32 ) or an asparagine residue within a tetra- or pentapeptide forms two short-range (in terms of sequence location) main-chain and side-chain hydrogen bonds with the sequentially adjacent amino acids (Figure 1B). We observe a similar Asx-motif involving the catalytic Asp 215 from the D-loop C tetrapeptide (Figure 1C). Additionally, there are four conserved long-range hydrogen bonds between the D- and G-loops in both N- and C-domains (Figure 1B,C). We will refer to the substructures shown in Figure 1B,C as the psi-loop N and psi-loop C motifs. Each psi-loop motif is an eight-residue 3D structure consisting of D- and G-loop residues that are held together by six hydrogen bonds. The geometric characteristics of these six hydrogen bonds are given in Table S2 (row 1a, columns 4-6).
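To illustrate the kind of geometric check that underlies a hydrogen-bond inventory, the sketch below computes the distance between a donor and an acceptor atom and flags the pair as a candidate hydrogen bond. The 3.5 Å cutoff, the function names, and the coordinates are assumptions made for illustration only; they are not the criteria or data used to build Table S2.

```python
import math

# Assumed donor-acceptor distance cutoff in angstroms (illustrative, not from this work).
MAX_DONOR_ACCEPTOR_DISTANCE = 3.5


def atom_distance(atom_a, atom_b):
    """Euclidean distance between two (x, y, z) coordinates, in angstroms."""
    return math.dist(atom_a, atom_b)


def is_candidate_hydrogen_bond(donor, acceptor, cutoff=MAX_DONOR_ACCEPTOR_DISTANCE):
    """True if the donor and acceptor heavy atoms lie within the distance cutoff."""
    return atom_distance(donor, acceptor) <= cutoff


# Hypothetical coordinates for a main-chain carbonyl oxygen and an amide nitrogen.
carbonyl_o = (10.0, 12.0, 5.0)
amide_n = (12.0, 13.0, 7.0)

print(atom_distance(carbonyl_o, amide_n))               # 3.0
print(is_candidate_hydrogen_bond(amide_n, carbonyl_o))  # True
```

A distance test of this kind is only a first filter; angular criteria are usually applied as well before a contact is reported as a hydrogen bond.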
Comparison of the Psi-Loop N and Psi-Loop C
Despite the apparent similarity, the psi-loop N and psi-loop C motifs are not identical. While making similar interactions, the D-loop C is five amino acids long (Asp 215 -...-Ser 219 ) and the D-loop N has only four residues (Figure 1B,C). Moreover, the conformations of the two respective G-loops differ. The G-loop C at its C-terminus contains a β-turn, which is stabilized by the hydrogen bond between O/Gly 302 and N/Phe 305 , while the G-loop N does not have a similar substructure. As a result, there is a conformational difference between Phe 305 and its structural counterpart in the N-domain, Tyr 125 : Phe 305 takes part in the conformational arrangement of its respective psi-loop, while Tyr 125 does not. Still, the two psi-loop motifs are bound by a set of equivalent interactions, where the O/Asp 32 -N/Leu 123 hydrogen bond in psi-loop N is substituted by the O/Thr 218 -N/Asp 303 hydrogen bond in psi-loop C , and where the O/Ser 35 -N/Ala 124 hydrogen bond in psi-loop N is substituted by the O/Ser 219 -N/Val 304 hydrogen bond in psi-loop C (Figure 1B,C).
The structural changes described above appear to result in tighter binding of Asp 32 to the G-loop N than of Asp 215 to G-loop C , since the distance from Asp 32 to G-loop N is shorter than that from Asp 215 to G-loop C . It is possible that this structural fact is the main reason for the differences in functional activity between Asp 32 and Asp 215 in the proposed models of catalytic hydrolysis of peptide bonds by acid proteases [57][58][59]. If Asp 32 is more tightly bound, with more potential hydrogen bonds than Asp 215 , then its nucleophilicity would be expected to be somewhat decreased. Thus, Asp 215 of the C-domain would play a more prominent role in the proteolytic cleavage of dipeptide substrates than Asp 32 of the N-domain.
The structural association of two psi-loops and the DD-zone allows us to obtain an assembly of structural elements of the structural catalytic core (SCC) of propepsin (Figure 2A). It includes all 28 amino acids listed in Table 1 (rows 1a and 1b).

Activation of Free Pepsin
The conversion of propepsin to active pepsin is achieved through proteolytic cleavage and subsequent removal of the N-terminal amino acid fragment. Here, we are mostly interested in changes that occur in the propepsin structural core, SCC. A structural comparison of propepsin (PDB ID: 3PSG) and mature pepsin (PDB ID: 4PEP) showed that rearrangements occur only in DD-link N and its immediate environment. First, the length of the tetrapeptide Asp 11 -...-Tyr 14 was reduced by one residue at its N-terminus (Tables 1 and S1). Then, the two-stranded β-sheet (Glu 13 -...-Phe 15 )/(Val 7p -...-Lys 9p ) is replaced with a structurally similar two-stranded β-sheet (Glu 13 -...-Phe 15 )/(Glu 7 -...-Tyr 9 ) (Tables 1 and S2). Thus, upon pepsin activation the architecture of the SCC remains largely unchanged.

Pepsin/Ligand Complex
During activation, the propepsin structure transforms into the active pepsin structure, ligand-free form. How does interaction with the ligand affect the SCC? Let us consider the 3D structure of the pepsin/saquinavir complex (PDB ID: 6XCZ). The key contacts between pepsin and the small-molecule ligand (saquinavir, ROC 401 ) are four hydrogen bonds (Figure 2B; Table S3, rows 1e and 1f). Two pairs of conserved residues from the D-loops of the N- and C-domains, Asp 32 /Gly 34 and Asp 215 /Gly 217 , donate four oxygen atoms as part of the four hydrogen bonds. Each of the two aspartates forms an Asx-motif [56], and in addition to the four hydrogen bonds above, there are two additional hydrogen bonds via the mediator-waters HOH 527 and HOH 645 (Figure 2B), and also a hydrogen bond that involves the OH atom of Tyr 189 , the central residue of the tripeptide DD-link C . Thus, DD-link C interacts with the inhibitor. Aside from the extensive hydrogen bonding inventory described above, binding of a ligand does not introduce any visible structural changes to the ligand-free form of the SCC of pepsin (Tables S1 and S2, rows 1c-1f).
The location of the structural catalytic core (SCC) in the 3D structure of propepsin is shown in Figure 3.
Molecules 2024, 29, x FOR PEER REVIEW

Structural Core in Proteins of the Pepsin-like Family
2.3.1. DD-Zones
Earlier, we showed that in propepsin the segment Asp 11 -Phe 15 , which includes DD-link N , interacts with the pro-tripeptide Val 7p -Lys 9p (Figure 1A) by means of interactions listed in Table S2. During the transition from the inactive zymogenic form to the enzymatically active form, DD-link N is slightly structurally modified as described above, and the pro-tripeptide is spatially substituted by the N-terminal tripeptide (Glu 7 -Tyr 9 ; Table 1). Interactions between DD-link N and the N-terminal tripeptide are shown in Table S2. We also observed similar structural rearrangements in the other members of the pepsin-like family, although there are variations from the rule: with the histo-aspartic protease (HAP), DD-link N is one amino acid longer, and with procathepsin E, only one amino acid of the propeptide, Arg 9p , contacts DD-link N (Table 1). However, the general structural trend for the pepsin-like family is the same.
In propepsin and pepsin, the contact between DD-link N and D-loop N involves a water molecule as an intermediary (Figure 1; Table S1). In the structure of ligand-bound pepsin, a water molecule does not participate in interactions as an intermediary. A similar water presence and functionality is observed for all of the remaining proteins of the pepsin-like family. However, considering differences in the resolution of structures (Table 1) and the associated difficulties in localization of the bound water molecules, it is not always possible to unambiguously correlate the presence or absence of a water molecule with any form of protein, and thus exceptions are possible.
In pepsin, the contact between D-loop N and DD-link C involves the amino acid Tyr 125 as a structural mediator (Figure 1; Table S1). In a number of proteins, there is also a mediating water molecule in addition to the aromatic amino acid (Table S1, column 5). In three proteins, xylanase inhibitor, basic 7S globulin, and EDGP, there are two mediator residues instead of a single Tyr 125 . A hydrogen bond between the ends of DD-link C and D-loop C is, however, conserved and contains no mediator insertions in any of the analyzed structures (Table S1, column 6). The contact between D-loop C and DD-link N does not contain mediators, but can be variable in its nature, being a hydrogen bond, a weak hydrogen bond, or a hydrophobic interaction (Table S1, column 7).

Fireman's Grip Motif Reflects Open/Close-Conformation Structural Change
In the pepsin-like family proteins, the open/close-conformation structural change during the transition from the inactive zymogen to the enzymatically active form can either lead to conformational changes in the DD-zone or not. In proteins where the hallmark Asp-Thr/Ser-Gly sequence (see Section 2.2) in the C-terminal domain contains serine, the conformational change in the DD-zone does take place, and it is reflected by the change of the fireman's grip motif (Table S1, column 8). In proteins where the hallmark Asp-Thr/Ser-Gly sequence in the C-terminal domain contains threonine, the open/close conformational change in the DD-zone does not take place.

Psi-Loops
As noted above, the psi-loop motif includes amino acids from the D- and G-loops. In pepsin, both D-loops contain a catalytic aspartate. Of the thirteen proteins studied, eight are active hydrolases and have both catalytic aspartates (Table 1). In the HAP protein, an evolutionary Asp 32 His mutation did occur, which, however, did not lead to a loss of catalytic activity because the other aspartate, Asp 215 , was still present [36]. The remaining four proteins, cathepsin D, xylanase inhibitor, basic 7S globulin, and EDGP, lost their enzymatic activity due to the replacement of the catalytic aspartate with another amino acid in the C-terminal domain [37,43,44,45]. Loss of catalytic activity in these proteins versus the HAP protein is strong evidence that proteolytic activity requires the aspartate of the C-terminal domain, whereas the aspartate of the N-terminal domain may be dispensable.
Both psi-loop N and psi-loop C motifs are structurally identical among the thirteen proteins of the pepsin-like family in three different forms (proenzyme, mature enzyme, and enzyme/ligand complex) (Table S2, columns 4 and 5). That is, replacing the catalytic aspartate with another amino acid either does not affect the conformation of the psi-loop motifs or affects it insignificantly. Structural conservation of the psi-loop conformation also occurs despite structural rearrangement in the tetrapeptides forming the Asx-motif in some proteins (Table S2, column 6). For example, six proteins in one or several forms show a structural transition from the Asx-motif to an Asx-turn [60], which, unlike the Asx-motif, lacks the hydrogen bond between the atoms of the first and fourth residues of the tetrapeptide. The structures of these six proteins, the HAP protein, plasmepsin 4, phytepsin, xylanase inhibitor, basic 7S globulin, and EDGP, have geometrical parameters that formally exceed those of a canonical hydrogen bond [61].

Ligand Bound Pepsin-like Proteins
Section 2.2.3 identifies seven amino acids of the pepsin SCC that are responsible for ligand recognition. These are (1, 2, 3 and 4) the catalytic Asp/Gly pairs of (Asp-Thr/Ser-Gly) N and (Asp-Thr/Ser-Gly) C , the N-terminal and C-terminal Asp-Thr/Ser-Gly motifs; (5 and 6) the two C-terminal serine residues of D-loop N and D-loop C ; and (7) Tyr 189 , the central residue of the tripeptide DD-link C . Of the thirteen pepsin-like representative structures listed in Table 1, only seven had a complex with a ligand close to or within the SCC. Six of these seven structures had similar D-loop/ligand contacts (Table S3). Again, the HAP protein was unique in lacking the expected contacts of Ala 217 and Ser 219 with the K95 inhibitor, which are seen in all of the other structures. In the HAP protein, instead of those contacts, Ala 217 and Ser 219 of chain_A formed hydrogen bonds with Asn 279 of chain_B, i.e., O/Ala 217_A -N/Asn 279_B at 2.9 Å and OG/Ser 219_A -ND2/Asn 279_B at 3.1 Å, respectively, and a weak hydrogen bond with Glu 278A of chain_B (designated as Glu 278A_B in the PDB file of 3QVI), O/Ala 217_A -CA/Glu 278A_B at 3.4 (2.6) Å, 127° (for the definition of parameters of weak hydrogen bonds, see [15]). The changes in contact partners for Ala 217 and Ser 219 are due to the fact that in the inhibitor complex the enzyme forms a tight domain-swapped dimer, not previously seen in any aspartic protease [36]. As a result of such domain-swapped dimerization, Glu 278A of chain_B forms contacts with the inhibitor instead of Ala 217 and Ser 219 of chain_A (Table S3, row 4f and column 5).
Taken together, the pepsin-like family proteins from Table 1 have their SCC constructed from the same set of conserved amino acids in all three forms, i.e., proenzyme, ligand-free enzyme, and ligand-bound enzyme, while the most noticeable structural changes concern the transition of the DD-links and fireman's grips from the zymogenic form to the enzymatic form. The DD-zones include the N-terminal and C-terminal D-loops, D-loop N and D-loop C , with their ends linked by the longer DD-link N and a water molecule, and a shorter DD-link C plus a mediator molecule (Figure 1A).
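The weak hydrogen bond above is reported with two distances and an angle. As a hedged illustration of how such parameters can be derived from atomic coordinates, the sketch below computes a carbon-oxygen distance, a hydrogen-oxygen distance, and the angle at the hydrogen atom. That the three reported numbers correspond to exactly these quantities is an assumption here (the definition is given in [15]); the function names and coordinates are hypothetical.

```python
import math


def angle_at(vertex, p1, p2):
    """Angle p1-vertex-p2 in degrees for three (x, y, z) points."""
    v1 = [a - b for a, b in zip(p1, vertex)]
    v2 = [a - b for a, b in zip(p2, vertex)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def weak_hbond_parameters(carbon, hydrogen, oxygen):
    """Return (C...O distance, H...O distance, C-H...O angle in degrees)."""
    return (
        math.dist(carbon, oxygen),
        math.dist(hydrogen, oxygen),
        angle_at(hydrogen, carbon, oxygen),
    )


# Hypothetical coordinates chosen so that the angle at the hydrogen is a right angle.
d_co, d_ho, angle = weak_hbond_parameters((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0))
print(round(d_co, 3), round(d_ho, 3), round(angle, 1))  # 2.236 2.0 90.0
```

Crystal structures often lack hydrogen coordinates, so in practice the hydrogen position would first have to be modeled before such an angle can be evaluated.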

DD-Zones
The retroviral protease (retropepsin) family is the second family of acid proteases listed in Table 1. Hydrolases of this family do not have a zymogenic form, and the enzyme is a dimer of two identical amino acid chains. Figure 4A shows a DD-zone of HIV-1 protease (PDB ID: 3IXO). The main differences between the DD-zones of pepsin and HIV-1 are the number of residues forming DD-links and an absence of mediators.

A change in the number of residues in the DD-links is usually associated with the presence or absence of the need to form a β-structural contact with either the propeptide or the N-terminal fragment (Figure 4A vs. Figure 1A). However, a decrease in the length of the DD-link by one amino acid does not necessarily lead to a change in the position of the D-loops relative to each other. Such is the case for the HIV-1 protease, where atoms of the long side-chain of Arg 8 (DD-link in HIV-1) interact with Asp 29 (D-loop in HIV-1) instead of the oxygen atoms of the shorter side-chains of Asp 11 (DD-link in pepsin) and Ser 219 (D-loop in pepsin) (Figure 4A vs. Figure 1A, Table S1).
In the XMRV protease (PDB ID: 3NR6), there is glutamate (DD-link in XMRV) in place of Arg 8 (DD-link in HIV-1) and glutamine (D-loop in XMRV) instead of Asp 29 (D-loop in HIV-1) (Table 1), which results in some changes in the architecture of the DD-zone in the XMRV protease compared to HIV-1 (Figure 4B, Table S1). In XMRV, there is an increase in the distance between the ends of the DD-link and the D-loop, which results in the absence of a direct contact between them. However, in XMRV, the D-loop/DD-link contact happens through the mediator residue Arg 95 , which also participates in the formation of the psi-loop (Figure 4B).
Thus, the distinctive feature of the retroviral protease (retropepsin) family hydrolases is within the DD-zones, where the D-loops are bound by short DD-links of two residues plus a mediator residue. Additionally, in HIV-1 and XMRV, there is a separate residue, Arg 87 (in HIV-1)/Arg 95 (in XMRV), which interacts with Asp 29 (in HIV-1)/Gln 36 (in XMRV) via a conventional hydrogen bond, NH2/Arg 87 -OD1/Asp 29 (Table S1, column 5), and stabilizes the conformation of the D-loop. The function of this residue in HIV-1 and XMRV is unknown.

Psi-Loops in HIV-1 and XMRV
As noted above, a homodimer of two identical amino acid chains is the active form of the HIV-1 protease. Therefore, one can expect the conformation of the psi-loop motif in chains A and B to be identical. We found that HIV-1 and XMRV not only have similar psi-loop motifs, but that these motifs are also similar to that observed in the C-domain of pepsin (Figures 1C and 4C). That is, the identical psi-loops in HIV-1 and XMRV have adopted a conformation that provides the catalytic aspartate with higher proteolytic efficiency in both subunits (Table S2). In Table S2, homodimer chains A and B in HIV-1 (and other retroproteases) are listed as the respective counterparts of the N- and C-domains in pepsin, but this is an arbitrary assignment.

Ligand-Bound Forms of Retroviral Proteases
The DD-zones of ligand-bound pepsin and HIV-1 are very similar to each other (Figures 2B and 4D). The main interactions are made by three amino acids from each of the two D-loops, totaling six interacting residues (Table S3). In HIV-1, these residues are Asp 25 , Gly 27 , and Asp 29 from the D-loop of chain_A and the identical residues from the D-loop of chain_B of the HIV-1 homodimer (Figure 4D). For comparison, in pepsin, those amino acids are Asp 32 , Gly 34 , and Ser 35 from D-loop N and Asp 215 , Gly 217 , and Ser 219 from D-loop C (Table S3). In addition, for pepsin, Section 2.2.3 describes the additional Tyr 189 from the DD-link C that is involved in contacts with the ligand. In the ligand-bound HIV-1 protease (PDB ID: 5YOK), a combination of Arg 8 (DD-link)/Asp 29 (D-loop) performs an analogous role. Similar to HIV-1, in the ligand-bound XMRV (PDB ID: 3SLZ), the C-terminal position of the D-loop, Gln 36 , also participates in ligand binding (Table S3, last column). Replacing Asp 29 (in HIV-1) with Gln 36 (in XMRV) also results in additional hydrogen bonds formed between XMRV and the inhibitor. Interaction with the ligand does not seem to affect the architecture of the DD-zone in the HIV-1 and XMRV proteases (Table S1).
The X-ray structure of the retroviral HIV-1 protease (Figure 4D) shows an identical mode of interaction between the two catalytic aspartates, Asp 25 of chain_A and chain_B, and the bound ligand. However, if we take into account additional neutron crystallography data, we find that the catalytic aspartates are not identical in terms of their protonation state [62,63]. According to these data, one aspartate is protonated and the other is deprotonated at physiological pH. As a result, the two catalytic aspartates interact differently with the same ligand. The deprotonated aspartate uses one of its deprotonated side-chain oxygens to interact with the hydrogen bound to the O2 atom of the ligand. At the same time, the protonated aspartate uses its protonated side-chain oxygen to interact directly with the same O2 oxygen atom of the ligand. These additional experimental data show the different roles that these two aspartates play in the catalytic mechanism of the HIV-1 protease.
The SCCs of the HIV-1 and XMRV proteases are shown in Figure 5A,B. The location of the structural catalytic core (SCC) in the 3D structure of HIV-1 protease is shown in Figure 6.

SCCs of the Dimeric Aspartyl Proteases and Lpg0085-like Family Proteins
In HIV-1 and XMRV, we have shown how amino acid changes at the N-terminus of the DD-link and the C-terminus of the D-loop affect the structure of the DD-zone. The Ddi1 protease, like the XMRV protease, has glutamine as the C-terminal amino acid of the D-loop (Tables 1 and S1, rows 16c and 16d). However, the DD-links of the Ddi1 and XMRV proteases differ in length. In Ddi1, the number of amino acids in the DD-link is twice that in the XMRV protease (4 residues versus 2), while in Lpg0085 the DD-link is a single residue (Figure 7A,B; Tables 1 and S1, rows 18c and 18d). To compensate for this reduction in DD-link length in Lpg0085, a mediator dipeptide, Arg147-Asp148, is additionally present for DD-zone formation. Thus, the DD-zones of the dimeric aspartyl proteases and the Lpg0085-like proteins are characterized by the presence of either a longer DD-link of four residues or a shorter DD-link of one residue plus a separate two-residue mediator. As in the case of retroviral proteases, Ddi1 and Lpg0085 use the psi-loop C motif, which is equivalent to the C-terminal version of the psi-loop motif in pepsin-like family proteins (Tables 1 and S2, rows 16c, 16d, 18c and 18d). Unlike Ddi1 and Lpg0085, the ApRick protease does not form a canonical dimer [25]. However, the psi-loop in the ApRick protease monomer is still identical to that in Ddi1 and Lpg0085 (Figure 7C; Tables 1 and S2, row 17c). Li et al. suggested that the ApRick protease "may represent a putative common ancestor of monomeric and dimeric aspartic proteases" [25].
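The way a DD-zone is closed can be summarized as a small decision rule. The following is only a minimal sketch, assuming the rule stated in the Conclusions (a DD-link of two residues or fewer needs mediator residues or water to fill the gap) and using only the DD-link lengths and mediator residues mentioned in the text and in the Figure 5 caption. The names `DDZone`, `dd_zone_mode`, and `bridging_water` are hypothetical and are not part of the original analysis.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DDZone:
    """Hypothetical record describing how one DD-zone is built."""
    protein: str
    dd_link_length: int              # number of residues in the DD-link
    mediator: Tuple[str, ...] = ()   # mediator residues, if any
    bridging_water: bool = False     # assumed flag: water fills the gap


def dd_zone_mode(zone: DDZone) -> str:
    """Label the DD-zone using the 'two residues or fewer' rule."""
    if zone.dd_link_length > 2:
        return "long DD-link"
    if zone.mediator:
        return "DD-link + mediator"
    if zone.bridging_water:
        return "DD-link + water"
    return "unassigned"


# Values quoted in the text: Ddi1 (4 residues), XMRV (2 residues plus
# mediator Arg95), Lpg0085 (1 residue plus mediator Arg147-Asp148).
ZONES = [
    DDZone("Ddi1", 4),
    DDZone("XMRV", 2, mediator=("Arg95",)),
    DDZone("Lpg0085", 1, mediator=("Arg147", "Asp148")),
]

MODES = {zone.protein: dd_zone_mode(zone) for zone in ZONES}

if __name__ == "__main__":
    for name, mode in MODES.items():
        print(f"{name}: {mode}")
```

Under these assumptions, Ddi1 falls into the "long DD-link" group, while XMRV and Lpg0085 fall into the "DD-link + mediator" group. Proteins whose gap is filled by water would need the `bridging_water` flag set from the structure itself.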

Conclusions
Here, we have outlined the minimal conserved structural arrangement common to the acid protease superfamily of proteins, which we refer to as the structural catalytic core (SCC). We began with the pepsin-like family proteases, where we defined the DD-zone (Figure 1A). The DD-zone is a circular structural motif defined by the substructures around the catalytic aspartates in the N- and C-terminal domains, D-loop N and D-loop C, and their interactions with the peptides DD-link N and DD-link C, which join the ends of D-loop N and D-loop C. We then extended the common substructure by defining the psi-loop N and psi-loop C motifs, in which the DD-zone interacts through its D-loops with two external tetrapeptides, G-loop N and G-loop C, whose residues overlap with the Hydrophobic-Hydrophobic-Gly sequence motif [51] (Figure 1B,C). While the two psi-loop motifs are formed according to the same logic, they differ in the environment around the catalytic aspartates, which may determine their different functional roles. Taken together, the psi-loops and the DD-zone define the structural boundaries of the SCC in pepsin-like proteins.
The other families of acid proteases, namely retroviral proteases (retropepsin), dimeric aspartyl proteases, and Lpg0085-like proteins, also have DD-zone and psi-loop substructures similar to those of pepsin. Pepsin can be very roughly described as a "hetero psi-loop" protein: psi-loop N and psi-loop C are not structurally identical, and psi-loop C is the more functionally active of the two. In contrast, the retroviral proteases, dimeric aspartyl proteases, and Lpg0085-like proteins can be described as "homo psi-loop" proteins, since they consist of two identical chains. The homo psi-loops are both structurally similar to psi-loop C of pepsin. As with the pepsin-like proteases, the other three protein families use DD-links to form a DD-zone (Table 1). If a DD-link is two amino acids long or shorter, additional mediator residues or water molecules fill the gap. Some mediator residues are located in sequence either at the C-terminus of the G-loop or immediately after it. Based on the structures seen so far, we can argue that a specific "long DD-link", "DD-link + mediator", or "DD-link + water" combination is the same within a structural family of the acid protease superfamily and may distinguish that family from the others.
In summary, the SCC of the acid protease superfamily proteins consists of a dimer of DD-link, D-loop, and G-loop blocks, where the two D-loop plus DD-link elements form a DD-zone, and the D-loops together with the G-loops form two psi-loops. Defining the SCC in this way allows us to outline a minimal common substructure for an entire protein superfamily, such as the acid proteases. This substructure combines amino acid conservation and protein functionality, which together can be used for protein comparison, structure identification, protein family separation, and protein engineering.

Figure 1. Three building blocks of the structural catalytic core (SCC) in propepsin (PDB ID: 3PSG), a representative member of the pepsin-like family of the acid protease superfamily. (A) DD-zone, (B) psi-loop N, and (C) psi-loop C. The dashed lines show long-range hydrogen bonds between the bordering amino acids of fragments of the primary structure of the protein (D-loops, DD-link, mediator, and G-loops), thus determining the cyclic nature and residue composition of each block. A dimer of dipeptides, Asp32-Thr33 and Asp215-Thr216, from two D-loops, forms the fireman's grip in the DD-zone, which is characterized by four long-range hydrogen bonds, while the tetrapeptides Asp32-...-Ser35 and Asp215-...-Thr218, from two D-loops, form the Asx-motif in psi-loop N and psi-loop C, which is characterized by two short-range hydrogen bonds. Structural differences in the two long-range hydrogen bonds located within psi-loop N (O/Asp32-N/Leu123 and O/Ser35-N/Ala124) and psi-loop C (O/Thr218-N/Asp303 and O/Ser219-N/Val304) influence the functional differences between the catalytic aspartates.

Figure 2. Interface organization of interactions between the SCC of pepsin and the ligand saquinavir. (A) A smooth coil representation is shown that passes through the CA atom positions of the pepsin SCC. The dashed lines show the complete set of long-range hydrogen bonds between the bordering residues of the six amino-acid sequence fragments. (B) The potential hydrogen bonding interactions between the D-loops of the DD-zone and saquinavir are shown with dashed lines.

Figure 3. The 3D structure of the active site in pepsin-like family aspartic proteases. The three boxes show the location of the structural catalytic core (SCC) in propepsin (PDB ID: 3PSG_A). It consists of a DD-zone (a central rectangle constructed using dotted lines) and two psi-loops (solid lines). The discussed structural elements (loops and links) are highlighted and labeled.

Figure 4. The building blocks of the SCC in the HIV-1 and XMRV homodimer proteases (PDB IDs: 3IXO and 3NR6, respectively), as representative members of the retroviral protease (retropepsin) family of the acid protease superfamily. (A) DD-zone of HIV-1 protease, (B) DD-zone of XMRV protease, and (C) psi-loop of HIV-1 protease. (D) The potential hydrogen bonding interactions (dashed lines) between two identical D-loops of the DD-zone and the ligand in the complex of HIV-1 protease with the inhibitor KNI-1657 (PDB ID: 5YOK).

Figure 5. SCC of (A) HIV-1 and (B) XMRV proteases. A smooth coil representation is used in the figures, which passes through the CA atom positions of the SCC of the corresponding retroviral protease. The SCC of the XMRV protease differs from the SCC of the HIV-1 protease by the inclusion of the mediator residue Arg95 from the G-loop in each monomer.

Figure 6. The 3D structure of the active site in retroviral protease (retropepsin) family aspartic proteases. The three boxes show the location of the structural catalytic core (SCC) in HIV-1 protease (PDB ID: 3IXO_A, B). It consists of a DD-zone (a central rectangle constructed using dotted lines) and two psi-loops (solid lines). The discussed structural elements (loops and links) are highlighted and labeled.

Molecules 2024 , 18 Figure 7 .
Figure 7. The building blocks of the SCC in the Ddi1 protease, Lpg0085 protein, and ApRick protease (PDB IDs: 4Z2Z, 2PMA and 5C9F, respectively), as representative members of the dimeric aspartyl protease and Lpg0085-like families of the acid protease superfamily. (A) DD-zone of Ddi1 protease, (B) DD-zone of protein Lpg0085, and (C) psi-loop of ApRick protease.

Figure 8. SCC of (A) Ddi1 protease and (B) protein Lpg0085. The main differences between the SCCs of the two proteins are the amino acid composition of the DD-links and the use of a mediator dipeptide in the structural formation of the DD-zone in the protein Lpg0085.

The SCCs in Ddi1 and Lpg0085 are shown in Figure 8A,B.

Table 1. Structural amino acid alignment of the structural catalytic core (SCC) in the acid protease superfamily proteins.

Table 1. Cont. Columns: N | PDB ID and Chain | R (Å) | Protein | EC Number | Propept. or N-Term Pept. | DD-Link | D-Loop | G-Loop | Mediator | Ref.
N/A-Not Available.